ABSTRACT

We apply Support Vector Machines, a machine learning algorithm, to the task of classifying
structures in the Interstellar Medium. As a case study, we present a position-position-velocity data
cube of $^{12}$CO J=3-2 emission towards G16.05-0.57, a supernova remnant that lies behind the M17
molecular cloud. Although these two objects partially overlap in position-position-velocity
space, they can easily be distinguished by eye based on their distinct morphologies.
The Support Vector Machine algorithm is able to infer these morphological distinctions, and associate
individual pixels with each object at >90% accuracy. This case study suggests that similar techniques
may be applicable to classifying other structures in the ISM, a task that has thus far proven difficult
to automate.

Subject headings: ISM: supernova remnants — ISM: individual objects (G16.05-0.57, M17) — tech- 
niques: image processing 



1. INTRODUCTION 

Classifying interesting objects in a dataset is an essential
early step in most analysis tasks. For many objects in
astronomy (stars, galaxies, solar system objects,
extragalactic supernovae), algorithms can identify and
characterize these objects automatically (Irwin 1985;
Bertin & Arnouts 1996; Naylor 1998). Structures in the
Interstellar Medium (ISM), however, have proven difficult
to classify in this way. These objects - which include,
e.g., molecular clouds, infrared dark clouds, bubbles,
jets, radiation-shaped pillars, and filaments - are
morphologically complex and heterogeneous. The essential
properties of these structures are hard to encode.

To take advantage of increasingly wide-area surveys,
previous researchers have mainly relied on manual
identification of features in the ISM (Churchwell et al. 2006;
Helfand et al. 2006; Curtis et al. 2010; Arce et al. 2010).
There are several drawbacks to this approach: it is time
consuming, non-repeatable and, when identifying complex
structures, affected by difficult-to-quantify selection
effects related to how specific people perceive an image.
Despite recent advances in the study of specific objects
(e.g. Infrared Dark Clouds, Peretto & Fuller 2009;
filaments, Men'shchikov et al. 2010), automated feature
identification in the ISM is still an open problem.

Machine learning algorithms are designed to infer pat- 
terns in data which are otherwise difficult to define ex- 
plicitly. They are grouped into two classes: supervised 
(in which the algorithm is "trained" to recognize a pat- 
tern via a set of training examples) and unsupervised (in 
which the algorithm identifies groupings within a dataset 
a priori). 

These algorithms can mechanize the process of object 
identification, and hence address many of the shortcom- 
ings of manual classification - they scale easily to other 
similar data, and their results are repeatable. A potential 
drawback of these methods is that, because the classification
is not guided by a physical model, they are susceptible
to over-fitting and to inheriting selection biases
within the training data. However, because the machine 
approach extends easily to other similar data, these bi- 
ases are more readily characterized via classification of 
test data sets. 

In this paper, we explore whether machine learning
algorithms can be used to catalog structures in the ISM.
We present as a case study a spectral line data cube
of $^{12}$CO emission towards the M17 star forming region.
Emission from this cloud overlaps with G16.05-0.57, a
supernova remnant situated behind the cloud. We use the
Support Vector Machine (SVM, Vapnik 1995) algorithm
to identify the supernova remnant and, on a pixel-by-pixel
basis, classify the original data cube. We are able to
use this classification to derive the mass and momentum
of the supernova remnant, which would not otherwise be
possible with these data.

2. THE DATA 

G16.05-0.57 is a supernova remnant in the inner Galaxy,
first discovered via its cm synchrotron emission by
Brogan et al. (2006). Because it is located behind M17,
we serendipitously observed the remnant with the James
Clerk Maxwell Telescope (JCMT) during an imaging
campaign of the latter region. Our observations of
G16.05-0.57 were taken on the nights of 2009 June 23-25,
using the HARP heterodyne receiver array (Smith et al.
2008). The observations target the $^{12}$CO J=3-2 line,
which traces moderately excited ($h\nu/k \sim 16$ K) and dense
($n_{\rm crit} \sim 10^3$ cm$^{-3}$) gas at a resolution of 15". The
data were acquired via position-switched raster scans,
using a reference position of $(\ell, b) = (15.5°, -2.4°)$.
Basket-weaving was used to reduce striping artifacts in the final
map (see, e.g., Section 2.1 of Davis et al. 2010). Weather
conditions were grade 2-3 (225 GHz opacity $\tau_{225} = 0.06$-$0.10$).
To convert from antenna temperature $T_A^*$ to main
beam temperature ($T_{\rm mb} = T_R = T_A^*/\eta$) we assumed an
aperture efficiency of $\eta = 0.63$ (Buckle et al. 2009).

G16.05-0.57 is visible as a filamentary arc subtending
20 arcminutes on the sky (Figure 1). To our knowledge,
these are the first molecular line observations of this
object. Features belonging to the supernova remnant have typical
linewidths of 15 km/s FWHM, and are centered on a
wide range of velocities from -5 to 90 km/s. It is
impossible to derive a kinematic distance from such broad
and scattered lines, but supernova features are absorbed
by overlapping cloud emission at 40 km/s. This
corresponds to a near kinematic distance of 3.4 kpc, coincident
with the Scutum galactic arm. The fact that G16.05-0.57
emits CO line emission suggests that the remnant is in
the process of colliding with another molecular cloud,
and we are observing the shocked interface of this
collision (Scoville et al. 1977; van Dishoeck et al. 1993).

Fig. 1. The G16.05-0.57 supernova remnant. The greyscale shows the intensity of our CO J=3-2 map at several velocities, while the
blue contours trace the 20 cm continuum emission presented in Brogan et al. (2006). The contour lines are at 1, 1.5, and 2$\sigma$ above the
background. The compact, filamentary emission especially noticeable at 49 km s$^{-1}$ is from the supernova remnant. The diffuse emission in
the first three panels is from clouds in the Scutum arm, while the bright emission at 19 km s$^{-1}$ is part of the M17 molecular cloud.

The morphology of structures in this datacube makes
for an appropriate case study of automated feature
classification in molecular line datasets. A representative
position-velocity slice through the data cube (Figure 2)
shows that the remnant overlaps molecular cloud emission
from M17 and the Scutum arm. However, the supernova
remnant has markedly different structure; features from
the remnant are spatially compact and extend over tens
of km/s in velocity space, while the other structures it
overlaps with are more spatially diffuse and kinematically
narrow. This suggests that a learning algorithm may be
able to distinguish emission from G16.05-0.57 based on
its unique morphology.

3. SUPPORT VECTOR MACHINES 

The Support Vector Machine algorithm is a supervised
learning algorithm which attempts to segregate
data points into two categories, based on a representative
sample of data belonging to each category.
The method has recently been applied to many diverse
problems in astronomy, including redshift estimation
(Wang et al. 2008), galaxy morphology identification
(Huertas-Company et al. 2008), and time series
analysis (Kim et al. 2011). Here we provide a basic
overview of the algorithm, but refer the reader to Press
et al. (2007) for a deeper and more precise derivation,
and to Vapnik (1999) for a discussion of the algorithm's
foundation in statistical learning theory. In what follows,
we use the SVM$^{light}$ implementation by Joachims (1999).
While this code is written in C, we have written a set of
wrappers to use these tools within IDL.

During training, the SVM algorithm takes as input a
feature vector for each training example - a set of N
quantities that describe the discriminating properties of
that object. These numbers can be thought of as coordinates
for a vector in N-dimensional feature space. Using
a training set of pre-classified feature vectors, the
algorithm searches for a decision boundary in feature space
that optimally separates examples from each category.
New data points are then assigned a classification based
on the side of the decision boundary on which they fall.

Fig. 2. A position-velocity slice through the data cube,
manually labeling some of the emission from the supernova remnant
(blue) and foreground clouds (red). The inset shows a
position-position view of the cube, and the red line shows the location of
the PV slice. The bright feature at 20 km/s is the M17 cloud,
while the emission at 40 km/s is from the Scutum Arm. Individual
supernova emission features are distinguished by their narrow
spatial extent and broad velocity profiles. Shocked emission from
the expanding supernova is detected at all velocities throughout
the cube. As a result, it overlaps other structures in several places.

More specifically, SVM seeks a decision boundary $B$
that maximizes a fitness function $F$ given by

$$F = M - C \sum_i \xi_i(B, M) \tag{1}$$



Here $M$ is the margin of the boundary; SVM attempts
to separate training data with a large gap between each
data point and the boundary, and $M$ defines the size
of this gap. $\xi_i(B, M)$ is the degree to which training
example $i$ violates this criterion. If example $i$ is separated
by more than $M$ from $B$ (and on the correct side), $\xi_i = 0$.
Otherwise, $\xi_i$ is the distance that $i$ would have to
be moved to satisfy this condition. The adjustable cost
parameter $C$ sets a tradeoff between large margins $M$
and poor classifications $\xi_i$.
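
As a concrete reading of the fitness function, the sketch below evaluates $F = M - C \sum_i \xi_i$ for a straight-line boundary in two dimensions. The parametrization of the boundary by a normal vector `w` and offset `b`, the function name `fitness`, and the toy data are illustrative assumptions; they are not taken from the implementation used in this paper.

```python
import math

def fitness(points, labels, w, b, margin, cost):
    """Evaluate F = M - C * sum_i xi_i for the line w.x + b = 0.

    labels are +1 or -1; xi_i is how far example i would have to move
    to sit at least `margin` away from the line on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    total_violation = 0.0
    for (x, y), label in zip(points, labels):
        # Signed distance to the line, positive on the correct side.
        distance = label * (w[0] * x + w[1] * y + b) / norm
        total_violation += max(0.0, margin - distance)
    return margin - cost * total_violation

# Toy data (hypothetical): two examples of each class along the x axis.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Vertical boundary at x = 2; the closest examples are 1 unit away.
print(fitness(points, labels, w=(1.0, 0.0), b=-2.0, margin=1.0, cost=100.0))
print(fitness(points, labels, w=(1.0, 0.0), b=-2.0, margin=1.5, cost=100.0))
```

With a large cost, widening the margin beyond the closest examples is penalized heavily, which is the behaviour described for large $C$.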

For a better understanding of Equation 1, consider
first the case where $C$ is very large, the feature space
is 2-dimensional, and the decision boundary is restricted
to a line (Figure 3, where the different symbols denote
training examples from two different classes). In this
scenario, the algorithm first optimizes Equation 1 over $M$
for a fixed boundary. Since $C$ is very large, even a single
training example on the wrong side of the margin will
heavily penalize Equation 1. Thus, the optimal $M$ will be
the margin that just touches the training example closest
to $B$ (i.e., the largest $M$ that satisfies $\sum_i \xi_i(B, M) = 0$).
The algorithm repeats this optimization over all boundaries,
finding the line that can accommodate the largest
margin. The final classification is illustrated in Figure 3;
the background shading shows how the algorithm classifies
feature space. The dotted line traces the margin on
either side of the boundary.

Next, consider the impact of reducing $C$. Individual $\xi_i$
terms now penalize Equation 1 less heavily. The optimal
boundary in this scenario may be one which misclassifies
a small number of outliers, but can afford to partition the
remaining data with a larger margin $M$. This is depicted
in Figure 4.

In addition to $C$, a second set of adjustable parameters
characterize a "kernel function". The kernel function
determines the topology of the decision surface. In
the simplest case, decision boundaries are hyper-planes
in feature space. In this paper, we use the radial basis
function (RBF) kernel, a popular and effective kernel
that allows for non-planar decision boundaries. The RBF
kernel has one adjustable parameter, $\gamma$, that controls the
curvature of the decision surface. For example, Figure 5
shows an SVM classification of a data set with three
different values of $\gamma$. Low values lead to stiff boundaries,
while high values lead to curved boundaries that
may over-fit to the training data. In this application,
over-fitting refers to the case when the decision boundary
conforms too tightly to the individual training examples,
and the larger-scale organization of the data is ignored.
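
To illustrate the role of the kernel parameter, the sketch below evaluates the RBF kernel in its standard form, $K(u, v) = \exp(-\gamma |u - v|^2)$. We assume this is the parametrization meant in the text; the function name `rbf_kernel` and the sample points are hypothetical.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * |u - v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

u, v = (0.0, 0.0), (1.0, 1.0)
for gamma in (0.01, 1.0, 100.0):
    # Small gamma: distant points still look similar (stiff boundary).
    # Large gamma: similarity falls off quickly (boundary can curve tightly).
    print(gamma, rbf_kernel(u, v, gamma))
```

Because the kernel value between two fixed points drops as the parameter grows, each training example influences a smaller neighbourhood of feature space, which is why large values allow the boundary to wrap around individual examples.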

A small complication arises when different elements of
the feature vector have different scales. Components of
the feature vector with very large numerical values
will dominate $\xi_i$, and the remaining components will have
a negligible effect on the classification. To circumvent
this, we normalize each element of the feature vectors,
such that the dispersions of each element across the data
are equal.
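
A minimal sketch of this rescaling is given below. It assumes the "dispersion" is the population standard deviation and that elements are only scaled, not shifted; the function name `normalize_features` and the sample vectors are hypothetical.

```python
from statistics import pstdev

def normalize_features(vectors):
    """Scale each feature-vector element to unit dispersion across the data."""
    columns = list(zip(*vectors))
    # A constant column has zero dispersion; leave it unscaled.
    scales = [pstdev(column) or 1.0 for column in columns]
    return [[value / scale for value, scale in zip(vector, scales)]
            for vector in vectors]

# Two elements with very different numerical scales (hypothetical values).
raw = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
scaled = normalize_features(raw)
print(scaled)
```

After scaling, both elements contribute comparably to distances in feature space.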

From a practical standpoint, then, an SVM-based
classification task involves four steps:

1. Manually classify a representative subset of the
data (§3.1).

2. For each classified example, create a feature
vector. This vector should encode the properties of an
object that make it identifiable (§3.2).

3. Choose a kernel function. In this study, we restrict
our attention to the radial basis function.

4. Train the algorithm, and optimize the classification
by adjusting free parameters ($C$ and $\gamma$ in this case;
§3.3).

Fig. 3. A simple classification task in two dimensions. Dots
and Xs represent training examples from two classes. Each training
example consists of a 2-dimensional feature vector, represented in
the figure by the position on the plane. The grey and white areas
denote which regions of feature-space an SVM classifier assigns to
each class. The SVM algorithm seeks a decision boundary which
maximizes the distance between training data and the boundary
(the margin, shown here as dotted lines).

Fig. 4. An illustration of how the cost factor $C$ influences
the SVM classification. The symbols and colors are the same as
in Figure 3. The left and right figures show classifications with
low and high values of $C$. Decreasing $C$ decreases the penalty for
mis-classifications near the boundary, increasing robustness against
outliers.

Fig. 5. An illustration of how the RBF kernel parameter $\gamma$
influences the SVM classification. As $\gamma$ increases, the decision
boundary becomes more curved, and conforms more tightly to individual
training examples. The margin in the rightmost plot wraps very
tightly around the Xs, and is not drawn.

In what follows we apply these steps to disentangle
overlapping emission from the supernova remnant
G16.05-0.57 and foreground molecular clouds.

3.1. Manually classifying data 

The success of any supervised learning algorithm is
limited by how representative the training set is. Because
our aim is a pixel-by-pixel classification of the data, our
training set consists of a subset of these pixels,
manually classified as associated with the SNR (or not). We carried
out this manual identification on subsets of four
position-velocity slices through the cube. The classification of one
of the planes is shown in Figure 2. In total, the training
set explicitly labels 0.4% of the pixels in the cube (5%
of the pixels above 3$\sigma$). However, as we show in Section
4.1, only a small fraction (5%) of these examples are
ultimately necessary.

3.2. Creating a feature vector 

Each pixel in our training set must be assigned a
feature vector - a list of numerical attributes that
distinguish between the supernova and unassociated
foreground objects. When classifying a pixel in the data by
eye, it is sufficient to examine the pixels in the
immediate vicinity. In particular, a $30 \times 30 \times 100$ pixel
sub-cube in PPV space is sufficient for a human to classify
the pixel in the center of that cube. At the presumed
3.4 kpc distance to the supernova, this corresponds to a
3 pc $\times$ 3 pc $\times$ 25 km s$^{-1}$ region. In principle, one could use
the intensities of these $9 \times 10^4$ pixels as a feature vector
for the central point. In practice, such a large feature
vector is prohibitively slow. We tested three strategies
to compress this information. Each of these strategies
defines a different feature vector:

Moment. For each pixel $p_i$, we extract the surrounding
$30 \times 30 \times 100$ pixels. We calculate the mean intensity
of this cube, and the first and second moments along each
direction through the data. These seven numbers
constitute the feature vector for $p_i$. Relative to other cloud
emission, supernova features have large velocity
dispersions and small spatial dispersions - this information is
encoded in the moments of the data.
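
The seven-element Moment vector can be sketched as follows. This is an illustration only: it assumes intensity-weighted moments (the first moment as the weighted mean position and the second as the weighted dispersion about it), a sub-cube with positive total intensity, and a nested-list cube indexed as `cube[x][y][z]`. The function name `moment_features` is hypothetical.

```python
import math

def moment_features(cube):
    """Return [mean, m1_x, m2_x, m1_y, m2_y, m1_z, m2_z] for cube[x][y][z]."""
    nx, ny, nz = len(cube), len(cube[0]), len(cube[0][0])
    total = sum(cube[i][j][k]
                for i in range(nx) for j in range(ny) for k in range(nz))
    features = [total / (nx * ny * nz)]
    for axis, size in enumerate((nx, ny, nz)):
        # Collapse the cube onto this axis to get a 1D intensity profile.
        profile = [0.0] * size
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    profile[(i, j, k)[axis]] += cube[i][j][k]
        # Intensity-weighted mean position and dispersion along the axis.
        first = sum(p * w for p, w in enumerate(profile)) / total
        second = math.sqrt(
            sum(w * (p - first) ** 2 for p, w in enumerate(profile)) / total)
        features += [first, second]
    return features

# Toy sub-cube: emission confined to one spatial pixel but spread in velocity,
# mimicking a compact, kinematically broad supernova feature.
cube = [[[1.0 if (i, j) == (1, 1) else 0.0 for k in range(5)]
         for j in range(3)] for i in range(3)]
print(moment_features(cube))
```

For this toy cube the spatial second moments vanish while the velocity second moment does not, which is the signature the text attributes to supernova features.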

Derivative. Spatial derivatives are sensitive to
edges in images, and such information can be used
to identify filamentary structures in astronomical data
(Molinari et al. 2010). We generate a feature vector that
encodes this information. We approximate the gradient
in each direction and pixel location using the Sobel edge
detection operator. To generate the feature vector for
pixel $p_i = (x_0, y_0, z_0)$, we sample profiles of each
derivative along each direction through the pixel:

$$
\begin{aligned}
P_1 &= \{ d_x(x_0 + \delta, y_0, z_0) \mid -15 < \delta < 15 \} \\
P_2 &= \{ d_x(x_0, y_0 + \delta, z_0) \mid -15 < \delta < 15 \} \\
P_3 &= \{ d_x(x_0, y_0, z_0 + \delta) \mid -50 < \delta < 50 \} \\
&\;\;\vdots \\
P_9 &= \{ d_z(x_0, y_0, z_0 + \delta) \mid -50 < \delta < 50 \} \\
P_{\rm final} &= \{ P_1 \cup P_2 \cup \cdots \cup P_9 \}
\end{aligned}
$$

For convenience, we further down-sample $P_{\rm final}$ to 60
elements, which defines the feature vector. We determined
the degree of downsampling that was appropriate by
examining the widths of typical features in the derivative
profiles by eye. Nevertheless, this smoothing may lead
to worse performance.

PCA. We approximate the $30 \times 30 \times 100$ sub-cube
around each pixel $p_i$ as a linear combination of 15
representative "basis cubes". We derive the basis cubes
using principal component analysis (PCA, Francis & Wills
1999), and these basis cubes capture ~95% of the
variance in the data. The 15 weights in the linear
combination define our final feature vector for $p_i$. This is
essentially a (lossy) compression of the data and, unlike
the first two methods, does not explicitly encode any
intuitive, identifying characteristics. Nevertheless, this
expression of the data has proved useful in other
classification tasks (e.g., asteroid taxonomy (Tholen 1984),
stellar spectral types (Singh et al. 1998), star/galaxy
separation (Cabanac et al. 2002)). PCA has also been
used to decompose and analyze molecular cloud
structure (Heyer & Schloerb 1997; Brunt et al. 2009).

3.3. Training and Optimization 

As discussed above, two free parameters influence the
training process: $\gamma$ and $C$. We use cross-validation to
choose optimal values for these parameters. We first
partition our classification examples into two independent
sets. The first (the training set) is used to train the
classifier using a given value for ($C$, $\gamma$). We then apply the
classifier to the second (validation) data set, and
measure the accuracy of the identification. We repeat this
process for different values of ($C$, $\gamma$) to maximize the
performance on the validation set. This approach
provides some protection against over-fitting, since over-fits
to the training data will poorly classify the validation set.
To maximize the independence of the training and validation
sets, the two samples were drawn from different regions
of the cube.
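
The parameter search described above can be sketched as a simple grid search against a held-out validation set. To keep the example self-contained, it uses a stand-in classifier (an RBF-weighted vote over training labels) rather than an SVM, so the grid contains only the kernel parameter; with an actual SVM the grid would also include the cost parameter. All function names and the toy data are hypothetical.

```python
import math
from itertools import product

def rbf_vote_classifier(train, gamma):
    """Stand-in classifier (not an SVM): sign of an RBF-weighted label vote."""
    def predict(x):
        score = sum(
            label * math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xt)))
            for xt, label in train)
        return 1 if score >= 0 else -1
    return predict

def accuracy(predict, examples):
    """Fraction of (vector, label) examples that `predict` labels correctly."""
    return sum(predict(x) == label for x, label in examples) / len(examples)

def grid_search(fit, train, validation, grid):
    """Try every parameter combination; keep the best validation score."""
    best_params, best_score = None, -1.0
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(fit(train, **params), validation)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy training and validation sets, drawn from separate points in feature space.
train = [((0.0, 0.0), -1), ((0.5, 0.2), -1), ((3.0, 3.0), 1), ((3.2, 2.7), 1)]
validation = [((0.3, 0.4), -1), ((2.8, 3.1), 1)]
best_params, best_score = grid_search(
    rbf_vote_classifier, train, validation, {"gamma": [0.01, 0.1, 1.0, 10.0]})
print(best_params, best_score)
```

Scoring on examples that were never used for fitting is what gives the search its protection against over-fitting.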

4. RESULTS AND DISCUSSION 

4.1. Classification Performance 
We evaluate the performance of each classifier by
comparing the accuracy with which it classifies the
validation data described in §3.3. The accuracy is simply the
fraction of correctly-labeled pixels. We find that the
feature vector which encodes the moments of the intensity
achieves the highest performance.

Fig. 7. The SVM classification of the same data shown in
Figure 2. Blue pixels are classified as belonging to the supernova,
while red pixels denote foreground cloud emission.

The maximum accuracies achieved using the Moment,
Derivative, and PCA feature vectors were 83%, 77% and
75%, respectively (the y intercept of Figure 6).
However, misclassifications are biased towards low-intensity
pixels. This is to be expected since, as Figure 2 shows,
pixels corresponding to blank sky have been included in
both classes in the classification set. Hence, the proper
classification of these faint pixels is not well defined.
Misclassifying noise is not problematic, however, since simple
thresholding later in the analysis can separate signal from
background. For most data analysis purposes, it is more
important that emission features be correctly classified.

Figure 6 shows the accuracy at which each classifier
identifies pixels above a given intensity threshold. Here,
the moment-based classifier has an accuracy of 90% for
emission detected at 3$\sigma$, and exceeds 95% accuracy for
the brightest pixels in the data. Figure 7 shows the
classification of the data in Figure 2, using the Moment
feature vectors. Figure 8 shows the classification of
several position-position channels. Many of the
misclassified bright pixels lie near the boundary between
supernova and cloud emission. This is perhaps expected, since
the moments of the intensity are measured (and
implicitly smoothed) over a $30 \times 30 \times 100$ sampling window.
Nevertheless, several supernova lines intersect and,
presumably, extend behind cloud material. In most cases,
the SVM classifier at least partially follows these
transitions from supernova to cloud.

Fig. 8. The same as Figure 7, but for the position-position slices through the cube also shown in Figure 1. An animation showing the
full classification can be found in the electronic version of this article.
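
The accuracy above a given intensity threshold, as discussed in this section, can be computed as in the sketch below. The function name `accuracy_above` and the toy pixel values are hypothetical; intensities are expressed in units of the noise level.

```python
def accuracy_above(intensities, truth, predicted, threshold):
    """Accuracy restricted to pixels with intensity >= threshold.

    Returns None when no pixel passes the threshold.
    """
    selected = [t == p
                for i, t, p in zip(intensities, truth, predicted)
                if i >= threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Toy validation pixels (hypothetical): 1 = supernova, 0 = cloud.
intensities = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0]
truth = [0, 1, 0, 1, 1, 0]
predicted = [1, 1, 1, 1, 0, 0]

for threshold in (0.0, 3.0):
    print(threshold, accuracy_above(intensities, truth, predicted, threshold))
```

Evaluating this function over a range of thresholds gives a curve of accuracy against intensity cut, with the zero-threshold value corresponding to the overall accuracy.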

There are three reasons why an SVM classifier mis- 
classifies data. First, data from two classes may not seg- 
regate perfectly in feature space (due either to noise in 
the data, errors in the training set, or a poorly-designed 
feature vector). Second, the topology of the decision 
boundary (determined by the SVM kernel function) may 
not be able to conform to the distribution of training data 
in feature space. Finally, the training examples may in- 
sufficiently sample how data are distributed in feature 
space. Each of these possibilities has implications for 
how to design and improve classification pipelines. 

Figures 9 and 10 provide some insight into what limits
the performance of our classification using the Moment
feature vectors. Using the optimal values of $C$ and $\gamma$
found above, we measured the learning rate - the
accuracy of the classification as a function of training set size.
Figure 9 shows this function, and suggests that most of
the meaningful information is contained within the first
few hundred examples. Remaining examples contain
redundant information, and confer little performance gain.

Misclassifications in the validation set may occur
because the training set isn't representative enough, and
the validation set samples a systematically different
region of feature space. To test this possibility, Figure 10
shows the classification accuracy when we re-train using
the original training set, augmented with a subset of the
validation data that were originally mis-classified. The
solid line re-optimizes $C$ and $\gamma$ at each step. The dashed
line shows the result when we fix these parameters to the
values used above.

Fig. 10. The accuracy of the classifier using the Moment feature
set. The solid line re-optimizes $C$ and $\gamma$ at each step. The dashed
line fixes these parameters at their original optimal values. The
dotted line is the trivial case of over-fitting, when each correction
example causes the algorithm to correctly identify one additional
example in the test set.

Note that in this experiment, part of the validation 
data is now explicitly included in the training set. This 
increases the risk of over-fitting the training data (i.e. 
devising arbitrary rules that fit the training data, but 
which do not generalize well to new data). As a trivial 
case of over-fitting, when a classifier is re-trained with a 
correction example in the validation set, it corrects the 
misclassification for that example only. The dotted line
in Figure 10 depicts this scenario. Any meaningful
performance gain should fall above this line, since the
classifier should ideally use the information in each correction
example to correct many additional mistakes.

The figure shows that, even when presented with ad- 
ditional correction examples, the algorithm shows essen- 
tially no performance gain. This figure rules out the 
possibility that misclassifications in the validation set are 
due to the training and validation sets sampling differ- 
ent regions of feature space. Instead, this strongly sug- 
gests that the classification task is limited by the partial 
overlap of the two classes in feature space. Additional 
training and correction examples are of little help in this 
situation, and a better feature vector is needed for fur- 
ther performance gain. 

We do not claim that these classifiers will necessarily
generalize to other data sets or classification tasks.
Other applications likely require re-training the SVM
algorithm using the data at hand, or testing new feature
vectors. However, Figure 6 suggests that the SVM
algorithm is capable of identifying morphological differences
in the ISM, and Figure 9 implies that this task can be
taught efficiently, with little manual classification. For
example, we have started to investigate whether
wind-blown bubbles can be identified using Spitzer colors and
edge information as a feature vector (Beaumont et al. in
prep.).

4.2. Mass and momentum of SNR G16.05-0.57

As mentioned above, the CO emission from G16.05-0.57
is likely due to the remnant's collision with a molecular
cloud, presumably in the Scutum galactic arm at 3.4
kpc. Our pixel-level classification of the data allows us to 
analyze the properties of this emission in isolation from 
foreground material. Here we derive an estimate of the 
mass and momentum associated with the cloud/remnant 
collision. 

In the limit that all material along the line of sight can
be described by a single excitation temperature $T_{\rm ex}$, the
equation for the observed radiation temperature $T_R$ is
(Ginsburg et al. 2011)



T R = T 
where $T_0 = h\nu/k = 16.6$ K for CO 3-2, $\tau$ is the optical
depth, and $f$ is the beam filling factor (which we take to
be 1). Furthermore, the column density of the J=3 state
is given by

where $A_{32}$ is the Einstein A coefficient. Equation 2 can
be solved for $\tau$, such that Equation 3 explicitly depends
only on $T_{\rm ex}$ and $T_R(v)$ (see, e.g., Equation A8 of
Ginsburg et al. 2011).

To constrain the excitation temperature and opacity in 
the line centers, we obtained supplementary 13 CO J=3- 
2 observations towards three bright knots of emission. 
Assuming that the filling factor of the gas and excitation 
temperature of the two CO isotopologues are the same, 
their intensity ratio gives the gas opacity: 



$$\frac{I_{12}}{I_{13}} = \frac{\nu_{12}}{\nu_{13}} \, \frac{1 - \exp[-\tau_{12}]}{1 - \exp[-\tau_{12} X_{13/12}]} \tag{4}$$



where $X_{13/12}$ is the abundance ratio of $^{13}$CO/$^{12}$CO,
which we take to be 1/70. Figure 11 shows the inferred
opacity for all pixels where we detect emission from both
isotopes. The typical optical depth is 3-5.
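
The intensity-ratio relation has no closed-form solution for the opacity, but it can be inverted numerically. The sketch below does so by bisection, under our reading of Equation 4 and assuming that $^{13}$CO is 70 times less abundant than $^{12}$CO, so that $\tau_{13} = \tau_{12}/70$. The rest frequencies are the standard J=3-2 values; the function names and the bracketing interval are illustrative choices.

```python
import math

NU_12 = 345.796  # GHz, 12CO J=3-2 rest frequency
NU_13 = 330.588  # GHz, 13CO J=3-2 rest frequency

def intensity_ratio(tau_12, abundance_13_to_12=1.0 / 70.0):
    """Predicted 12CO/13CO intensity ratio for a given 12CO opacity."""
    tau_13 = tau_12 * abundance_13_to_12
    return (NU_12 / NU_13) * (1.0 - math.exp(-tau_12)) / (1.0 - math.exp(-tau_13))

def solve_tau(observed_ratio, lo=1e-6, hi=1e3, tolerance=1e-9):
    """Invert the ratio for tau_12 by bisection.

    The predicted ratio falls monotonically as the opacity increases,
    so a single root is bracketed by [lo, hi] for physical ratios.
    """
    while hi - lo > tolerance:
        mid = 0.5 * (lo + hi)
        if intensity_ratio(mid) > observed_ratio:
            lo = mid  # predicted ratio too high: the line must be more opaque
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: the ratio predicted for tau = 4 is inverted back to tau = 4.
ratio = intensity_ratio(4.0)
print(ratio, solve_tau(ratio))
```

In the optically thin limit the ratio approaches the abundance ratio times the frequency ratio, and it falls toward the frequency ratio alone as the $^{12}$CO line saturates.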

Plugging $\tau = 4$ into Equation 2 gives an estimate
of the excitation temperature along each line of sight,
which we find to be $T_{\rm ex} = 15$-$30$ K. This in turn
allows us to evaluate Equation 3. Finally, we convert from
$N(J = 3)$ to $N({\rm CO})$ assuming the population levels are
thermalized, and to $N({\rm H_2})$ using an abundance ratio
$X({\rm CO}) = N({\rm CO})/N({\rm H_2}) = 10^{-4}$. The abundance of
CO within shocks is uncertain. In their study of CO and
H$_2$ vibrational lines in the C-type shocks of the Orion
KL region, Watson et al. (1985) measure an $X({\rm CO})$ of
$1.2 \times 10^{-4}$. On theoretical grounds, the value of $X({\rm CO})$
for dissociative shocks may be enhanced by up to a
factor of 100 if the re-formation of H$_2$ in post-shock gas is
less efficient than that of CO (van Dishoeck et al. 1992). Thus,
the actual value of $X({\rm CO})$ depends on both the type and
strength of shocks in G16.05-0.57, as well as the
microphysics of grain catalysis in shocked gas.

Fig. 11. Normalized histogram of the $^{12}$CO opacity inferred
from Equation 4 in regions where both $^{13}$CO and $^{12}$CO are
detected.
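
The step from a measured radiation temperature and an adopted opacity to an excitation temperature can be sketched as follows. The sketch assumes the standard single-temperature relation $T_R = f\,T_0\,(1 - e^{-\tau}) / (e^{T_0/T_{\rm ex}} - 1)$ with the background term neglected, which may differ in detail from the form used in this paper. The function names are hypothetical.

```python
import math

T0 = 16.6  # K, h*nu/k for CO J=3-2 (value quoted in the text)

def radiation_temperature(t_ex, tau, filling=1.0):
    """T_R for a single excitation temperature, ignoring the background term."""
    return filling * T0 / math.expm1(T0 / t_ex) * (1.0 - math.exp(-tau))

def excitation_temperature(t_r, tau, filling=1.0):
    """Invert radiation_temperature() analytically for T_ex."""
    return T0 / math.log1p(filling * T0 * (1.0 - math.exp(-tau)) / t_r)

# Round trip at the adopted opacity tau = 4 for excitation temperatures
# spanning the range quoted in the text.
for t_ex in (15.0, 20.0, 30.0):
    t_r = radiation_temperature(t_ex, tau=4.0)
    print(t_ex, t_r, excitation_temperature(t_r, tau=4.0))
```

At an opacity of 4 the factor $1 - e^{-\tau}$ is within 2% of unity, so the inferred excitation temperature is only weakly sensitive to the exact opacity adopted.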

Figure 12 presents the column density map of the
supernova remnant obtained from this analysis. The
angularly-integrated column density of the map is
$\int N({\rm H_2})\,d\Omega = 1.1 \times 10^{16}$ cm$^{-2}$ sr. If we further assume
that the remnant is located within the Scutum arm at
3.4 kpc, this implies a total mass of $M = 2300\,M_\odot$. We
approximate the velocity of the gas at each point by the
observed velocity dispersion - this is a lower limit, since
it only accounts for motion along our line of sight.
Nevertheless, this implies a total momentum of $2.2 \times 10^4$
$M_\odot$ km s$^{-1}$. For comparison, the typical momentum of
a supernova explosion is $\sim 1$-$10\,M_\odot \times 10^4$ km s$^{-1}$ $\sim$
$10^{4-5}\,M_\odot$ km s$^{-1}$.
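
The conversion from the solid-angle-integrated column density to a mass can be checked with a few lines of arithmetic. The mean mass per H$_2$ molecule is not stated in the text; the default used below (2.33 hydrogen masses) is an assumption made only for illustration, and the function name `mass_from_column` is hypothetical.

```python
PC_CM = 3.0857e18    # centimetres per parsec
M_H_G = 1.6735e-24   # hydrogen atom mass in grams
M_SUN_G = 1.989e33   # solar mass in grams

def mass_from_column(integrated_column, distance_kpc, mean_weight=2.33):
    """Mass in solar masses from the integrated H2 column (cm^-2 sr).

    mean_weight is the assumed mass per H2 molecule in units of the
    hydrogen mass (2.0 for pure H2); it is not given in the text.
    """
    distance_cm = distance_kpc * 1e3 * PC_CM
    # Multiplying by distance squared turns solid angle into physical area.
    n_molecules = integrated_column * distance_cm ** 2
    return n_molecules * mean_weight * M_H_G / M_SUN_G

mass = mass_from_column(1.1e16, 3.4)
print(round(mass))
```

With this assumed mean weight the result lands within a few percent of the quoted mass; a different choice rescales the mass linearly.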

The characteristic width of filamentary features in the
remnant is 30"-60" = 0.5-1 pc. Taking this to be the
line-of-sight depth of supernova emission implies a
volume density of $n = N/l \sim 500$-$1000$ cm$^{-3}$. This is
roughly 1-2 orders of magnitude lower than the
densities van Dishoeck et al. (1993) measured towards the
supernova remnant / molecular cloud collision IC 443.
The most likely explanation of this discrepancy is that
the filamentary features in our data consist of unresolved
filamentary substructure. If this is the case, then the
characteristic depth $l$ would be smaller, and the
corresponding volume density higher. However, this would
not affect our mass and momentum measurements, since
the increase in volume density is cancelled out by the
decrease in solid angle.
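
The angular-to-physical conversion and the density estimate $n = N/l$ used above can be reproduced with the sketch below. The column density used in the example is hypothetical, chosen so that a depth of 0.5 pc gives 1000 cm$^{-3}$; the function names are likewise illustrative.

```python
ARCSEC_PER_RADIAN = 206264.806  # arcseconds in one radian
PC_CM = 3.0857e18               # centimetres per parsec

def physical_size_pc(angle_arcsec, distance_pc):
    """Small-angle conversion from angular size to physical size."""
    return angle_arcsec / ARCSEC_PER_RADIAN * distance_pc

def volume_density(column_density, depth_pc):
    """n = N / l, with N in cm^-2 and the depth l in parsecs."""
    return column_density / (depth_pc * PC_CM)

distance_pc = 3400.0
for angle in (30.0, 60.0):
    print(angle, physical_size_pc(angle, distance_pc))

# Hypothetical column density, chosen so that n = 1000 cm^-3 at l = 0.5 pc.
column = 1000.0 * 0.5 * PC_CM
print(volume_density(column, 0.5), volume_density(column, 1.0))
```

Halving the assumed depth doubles the inferred density at fixed column density, which is why unresolved substructure would raise the density estimate.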

Fig. 12. The column density map of G16.05-0.57. The scale
bar assumes a distance of 3.4 kpc.

5. CONCLUSION

We have presented a case study of a supervised
learning task applied to classifying structures in the ISM. The
M17 molecular cloud overlaps shocked CO emission from
G16.05-0.57 but, because of the supernova's distinct
morphology in position-position-velocity space, the two
objects are readily distinguishable by eye. The SVM
classification algorithm is able to learn these morphological
differences using a representative sample of
manually classified pixels. We emphasize several important
characteristics of this approach:

1. Machine-based classification of ISM structures per- 
mits a pixel-level classification of datasets. This 
level of refinement is often prohibitively cumber- 
some via manual identification. 

2. By using an independent set of manually-classified 
validation data, we can characterize the quality of 
this classification. We can further use this infor- 
mation to refine and improve the algorithm's per- 
formance. 

3. Extracting information about the moments of the 
intensity distribution in our data produced the 
most effective classification. Other information 
(the weights in a principal component analysis, spa- 
tial derivatives) was less successful. 

4. Only a very small fraction of the data (~ 0.1%) 
needs to be categorized to train the algorithm. An 
efficient approach for future work may be to evalu- 
ate the classifier's performance as the training set is 
assembled, to better understand when the training 
set is large (and representative) enough. 

This case study suggests that automated algorithms 
are capable of identifying complex structures seen in the 
ISM. Such an approach may be useful in analyses of cur- 
rent and future surveys of the Milky Way's ISM, partic- 
ularly when identifying morphologically distinct struc- 
tures like bubbles, pillars, and filaments. 